THE CHERNOFF LOWER BOUND FOR SYMMETRIC QUANTUM
HYPOTHESIS TESTING

By Michael Nussbaum and Arleta Szkoła
Cornell University and Max Planck Society 

We consider symmetric hypothesis testing in quantum statistics, 
where the hypotheses are density operators on a finite-dimensional 
complex Hilbert space, representing states of a finite quantum system. 
We prove a lower bound on the asymptotic rate exponents of Bayesian 
error probabilities. The bound represents a quantum extension of 
the Chernoff bound, which gives the best asymptotically achievable 
error exponent in classical discrimination between two probability 
measures on a finite set. In our framework, the classical result is 
reproduced if the two hypothetic density operators commute. 

Recently, it has been shown elsewhere [Phys. Rev. Lett. 98 (2007) 
160504] that the lower bound is achievable also in the generic quan- 
tum (noncommutative) case. This implies that our result is one part 
of the definitive quantum Chernoff bound. 

1. Introduction. One typical problem in hypothesis testing is to decide
between two equiprobable hypotheses, say $H_0$ and $H_1$, where $H_i$ assumes
that the observed data are generated by an i.i.d. process with law $P_i$, $i = 0,1$.
In the classical setting, $P_0, P_1$ are probability measures on a measurable
space, the sample space. One discriminates between them by means of test
functions, which are nonnegative measurable functions on the n-fold product
sample space. An error occurs if, according to the given decision rule based
on the value of the test function, one accepts hypothesis $H_0$ while the data
are generated with law $P_1$, or vice versa.

If one declares one of the hypotheses to be the null hypothesis and the
other one the alternative, then errors occurring while the null hypothesis is
true are called "of first kind," otherwise "of second kind." Due to Stein's
lemma there exist test functions maintaining a given upper bound $\alpha$ on
the error probability of first kind, such that the probability of error of the
second kind decreases to 0 with the optimal asymptotic rate exponent equal
to the Kullback-Leibler distance from the null hypothesis to the alternative.
Sanov's theorem extends this result to the case where, instead of a single
measure $P_0$, a family $\Omega$ of measures is associated with the null hypothesis.
Then, the negative Kullback-Leibler distance from the set $\Omega$ to $P_1$ gives the
minimal asymptotic error exponent ([19], see also [7]).

In symmetric hypothesis testing one treats the errors of first and second
kind in a symmetric way. We will focus here on the Bayesian error probability,
which is the average of the two kinds of error probabilities. It is
minimized by the likelihood ratio test and vanishes exponentially fast as the
sample size n tends to infinity. The corresponding optimal asymptotic rate
exponent is equal to the Chernoff bound

\[ \inf_{0\le s\le 1} \log \int p_0^{1-s} p_1^{s} \, d\mu \tag{1} \]

pertaining to probability measures $P_0$ and $P_1$, with respective densities $p_0$
and $p_1$ (w.r.t. the dominating measure $\mu = P_0 + P_1$). These results go back to
papers by Chernoff and Hoeffding [6, 12]. Chentsov and Morozova [5] present
a thorough and illuminating discussion of the Chernoff bound, relating it to
the differential geometry of statistical inference.

If the data are obtained from quantum systems, then one has to replace
probability measures by quantum states, that is, by normalized positive
linear functionals on an appropriate algebra of observables. In the present
paper, this is assumed to be the algebra of linear operators on a
finite-dimensional complex Hilbert space. One discriminates between two states
$\rho_0$ and $\rho_1$ by means of quantum tests, which are defined as positive
operator valued measures on n-fold tensor products of the algebra of observables
of a single quantum system. Here, we employed the standard language of
quantum mechanics; throughout the paper, however, we will utilize an
elementary and accessible mathematical framework based on complex linear
algebra only. It will become apparent that quantum tests are analogs of test
functions defined on finite sample spaces and their n-fold products.

While the basic problems in nonsymmetric quantum hypothesis testing
(pertaining to $\alpha$-tests) were solved in [11, 18] and [3] by obtaining quantum
versions of Stein's lemma and Sanov's theorem, the case of discrimination (or
equally weighted hypotheses) has not yet received full treatment. Although
quantum tests minimizing the generalized Bayesian error probabilities were
constructed about 30 years ago by Helstrom and Holevo [10, 13], a closed
form expression for the optimal asymptotic quantum error exponent similar
to the classical Chernoff distance remained an open problem. One reason is
that there is no obvious canonical way to extend (1) to a quantum setting.
On the very formal level, due to noncommutativity effects, there are different
nonequivalent ways of generalizing the distance. In [18], Ogawa and Hayashi
list three candidates for the optimal quantum rate exponent, relying on
three different extensions of the target function in the variational formula
(1). However, two of these candidate expressions are not well defined if the
hypotheses are not faithful states, that is, if the associated density operators
do not have full rank.

Recently, the problem of symmetric quantum testing was treated by Kar- 
gin [14], with partial progress toward the definitive Chernoff bound. Lower 
and upper bounds on the optimal error exponent in terms of fidelity between 
the two density operators were given; the lower bound was shown to be sharp 
in the case that one of the density operators has rank one (i.e., represents a 
pure quantum state). We remark that fidelity is a notion of distinguishability
between density operators which is frequently used in quantum information 
theory (see, e.g., [8, 16]). 

Our main result, which we formulate rigorously in Section 2, states that
$\inf_{0\le s\le 1} \log \operatorname{Tr}[\rho_0^{1-s}\rho_1^{s}]$ is a lower bound on the general asymptotic
error exponent, $\rho_0$ and $\rho_1$ being density operators replacing the probability
densities $p_0$ and $p_1$ of the classical setting. We remark that our quantum bound
coincides with one of the three candidates for a quantum Chernoff bound
discussed in [18]. We prove the main theorem in Section 3. Recently,
Audenaert et al. have shown in [1] that, in accordance with our conjecture stated
in a previous version of the present work [17], the lower bound is indeed
achievable. This justifies referring to it as the quantum Chernoff bound.

2. Mathematical setting and the main theorem. For an elementary
introduction to quantum statistics with physical background, see Gill [9]. We
will describe here only the formalism for the simplest possible nonclassical
setup of discrimination between two hypotheses. A density matrix $\rho$ is
a complex, self-adjoint, positive $d \times d$ matrix satisfying the normalization
condition $\operatorname{Tr}[\rho] = 1$, where $\operatorname{Tr}[\cdot]$ is the trace operation. Here "positive" means
nonnegative definite. We identify a density matrix with a state of a quantum
system; we also use "matrix" and "operator" interchangeably. The two
hypotheses are described by two states, $H_0 : \rho = \rho_0$ and $H_1 : \rho = \rho_1$.

Physically discriminating between them corresponds to performing a
measurement on the quantum system. Mathematically, a measurement with k
possible outcomes is associated to a set of positive $d \times d$ matrices $\{r_1, \ldots, r_k\}$
adding up to the unit matrix. When the state is $\rho$, the probability of
the ith outcome is $\operatorname{Tr}[\rho r_i]$. In analogy to classical hypothesis testing one
accepts $H_0$ or $H_1$ according to a decision rule based on the outcome of
a measurement. In this case, there are k = 2 possible outcomes and any
appropriate measurement may be written $\{1 - r, r\}$, where r is a complex
self-adjoint positive matrix satisfying the inequality $0 \le r \le 1$. Here, 1 is the
unit matrix and $\le$ is in the sense of matrix order, that is, $1 - r$ is positive
(nonnegative definite). We will mostly make reference to this measurement
by its r element, the one corresponding to the alternative hypothesis. Then,
$\operatorname{Tr}[\rho r]$ is the overall probability of rejecting $H_0$ when $\rho$ is the true state.
Accordingly, $\operatorname{Tr}[\rho_0 r]$ is the error probability of first kind and
$\operatorname{Tr}[(1 - r)\rho_1] = 1 - \operatorname{Tr}[\rho_1 r]$ is the error probability of second kind. When both
$\rho_0$, $\rho_1$ and also r are diagonal matrices, then the setup reduces to the classical
testing problem for two probability measures on an appropriate index set $\Omega$,
$|\Omega| = d$, given by $\rho_0$, $\rho_1$, respectively. The same is true when $\rho_0$, $\rho_1$ have the
same set of eigenvectors; then, $\rho_0$, $\rho_1$ are said to commute (commutative case). In
this sense, commuting states describe the classical discrimination problem
between two probability measures on a finite sample space $\Omega$, as a special
case of the present quantum setting.

A pure state is given by a density matrix which has rank 1, which means it
is a projection onto a subspace of (complex) dimension one. We will also use
the following notation: we set $\mathcal{H} = \mathbb{C}^d$, with the understanding that $\mathcal{H}$ can
be any d-dimensional complex Hilbert space, and we write $\mathcal{B}(\mathcal{H})$, $\mathcal{B}(\mathcal{H}^{\otimes n})$
for the set of complex $d \times d$ or $d^n \times d^n$ matrices, respectively. In the bra-ket
notation, $|v\rangle$ and $\langle v|$ denote a vector in $\mathcal{H}$ and its dual vector with respect
to the scalar product in $\mathcal{H}$ (essentially a column and a row vector). A
one-dimensional projection onto a subspace of $\mathcal{H}$, spanned by a unit vector v,
may be written as $|v\rangle\langle v|$. It is a density operator of a pure state.

The above describes the basic setup where the finite dimension d is
arbitrary. We consider the quantum analog of having n i.i.d. observations. For
this, the two hypotheses are assumed to be $\rho_0^{\otimes n}$ and $\rho_1^{\otimes n}$ for two basic
d-dimensional states $\rho_0$, $\rho_1$, where $\rho^{\otimes n}$ is the n-fold tensor product of $\rho$ with
itself. (Recall that the tensor product $a \otimes b$ of two matrices is a matrix which
consists of blocks $a_{ij} b$, arranged according to the indices i, j. Thus, $\rho^{\otimes n}$ is
a $d^n \times d^n$ matrix.) The tests $r_n$ now operate on the states $\rho_0^{\otimes n}$ and $\rho_1^{\otimes n}$,
that is, their dimension is $d^n \times d^n$, but they need not have tensor product
structure. The corresponding Bayesian error probability is

\[ \mathrm{Err}(r_n) := \tfrac12 \operatorname{Tr}\bigl[r_n \rho_0^{\otimes n} + (1 - r_n)\rho_1^{\otimes n}\bigr]
   = \tfrac12 \bigl(1 - \operatorname{Tr}[r_n(\rho_1^{\otimes n} - \rho_0^{\otimes n})]\bigr). \]

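The optimal hypothesis tests minimizing the error probability are known to
be the Holevo-Helstrom hypothesis tests [10, 13]. They are given for each
$n \in \mathbb{N}$ by the projections

\[ \Pi_n^* := \operatorname{supp}\,(\rho_1^{\otimes n} - \rho_0^{\otimes n})_+ , \]

where supp a denotes the support projection of a linear operator a and $a_+$
means the positive part of a self-adjoint operator a. Thus, if $a = \sum_i \lambda_i E_i$
is the spectral decomposition using projections $E_i$, then $a_+ := \sum_{\lambda_i > 0} \lambda_i E_i$
and $\operatorname{supp} a_+ = \sum_{\lambda_i > 0} E_i$. Indeed, we have for an arbitrary test operator $r_n$

\begin{align*}
\mathrm{Err}(r_n) &= \tfrac12 \bigl(1 - \operatorname{Tr}[r_n(\rho_1^{\otimes n} - \rho_0^{\otimes n})]\bigr) \\
 &\ge \tfrac12 \bigl(1 - \sup\{\operatorname{Tr}[r(\rho_1^{\otimes n} - \rho_0^{\otimes n})] : r \in \mathcal{B}(\mathcal{H}^{\otimes n}) \text{ test}\}\bigr) \\
 &= \tfrac12 \bigl(1 - \sup\{\operatorname{Tr}[\Pi(\rho_1^{\otimes n} - \rho_0^{\otimes n})] : \Pi \in \mathcal{B}(\mathcal{H}^{\otimes n}) \text{ projection}\}\bigr) \\
 &= \tfrac12 \bigl(1 - \operatorname{Tr}[\Pi_n^*(\rho_1^{\otimes n} - \rho_0^{\otimes n})]\bigr)
  = \tfrac12 \bigl(1 - \tfrac12 \|\rho_1^{\otimes n} - \rho_0^{\otimes n}\|_1\bigr),
\end{align*}

where $\|a\|_1 = \operatorname{Tr}[a_+] + \operatorname{Tr}[a_+ - a]$ is the generalization of the $L_1$-norm. Note
that the last line above gives an exact closed form expression of the best error
probability for every n, but its asymptotics as $n \to \infty$ (rate of exponential
decay) is the subject of the present paper.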
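
As a numerical illustration of the closed form expression for the best error probability, the following Python sketch evaluates (1/2)(1 - (1/2)||rho1 - rho0||_1) for a single copy (n = 1) of two qubit states. It is only a sketch under simplifying assumptions: the density matrices are taken to be real symmetric 2 x 2 matrices, so that the eigenvalues of their difference are available in closed form. The function names and the example states are illustrative and do not come from the paper.

```python
import math

def trace_norm_2x2(a):
    """Trace norm of a real symmetric 2x2 matrix a = [[a00, a01], [a01, a11]]."""
    mean = (a[0][0] + a[1][1]) / 2.0
    radius = math.hypot((a[0][0] - a[1][1]) / 2.0, a[0][1])
    # The two eigenvalues are mean - radius and mean + radius.
    return abs(mean - radius) + abs(mean + radius)

def helstrom_error(rho0, rho1):
    """Minimal Bayesian error with equal priors: (1/2)(1 - (1/2)||rho1 - rho0||_1)."""
    diff = [[rho1[i][j] - rho0[i][j] for j in range(2)] for i in range(2)]
    return 0.5 * (1.0 - 0.5 * trace_norm_2x2(diff))

# Hypothetical example: the pure states |0><0| and |+><+|.
rho0 = [[1.0, 0.0], [0.0, 0.0]]
rho1 = [[0.5, 0.5], [0.5, 0.5]]
err = helstrom_error(rho0, rho1)
print(err)
```

Identical states give the error 1/2 (pure guessing) and orthogonal pure states give the error 0, which matches the interpretation of the trace norm as a measure of distinguishability.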

The Holevo-Helstrom tests $\Pi_n^*$ are noncommutative generalizations of the
likelihood ratio tests: if the hypotheses $H_0$ and $H_1$ correspond to commuting
density operators $\rho_0$ and $\rho_1$, then, for all $n \in \mathbb{N}$, the Holevo-Helstrom
projections $\Pi_n^*$ commute with $\rho_0^{\otimes n}$ and $\rho_1^{\otimes n}$, also. The density operators $\rho_i$
may be completely specified by their eigenvalues forming discrete probability
measures $P_i$, i = 0, 1, on an appropriate index set $\Omega$, $|\Omega| = d$, for the
mutually commuting spectral projectors on $\mathcal{H}$. For each $n \in \mathbb{N}$, the set of
eigenvalues of the tensor product $\rho_i^{\otimes n}$, i = 0, 1, corresponds to the respective
product measure $P_i^n := \bigotimes_{j=1}^n P_i$ on the Cartesian product $\Omega^n := \times_{j=1}^n \Omega$,
while the Holevo-Helstrom projection $\Pi_n^*$ generalizes the indicator function
$\lambda_n^* = \mathbf{1}\{p_1^n - p_0^n > 0\}$ on $\Omega^n$. Here, $p_i^n$ denote the probability densities of the
product measures $P_i^n$. We note that $\lambda_n^*$ is the well-known maximum likelihood
decision. It takes the value 1, which corresponds to a decision in favor
of $H_1$, on samples $x \in \Omega^n$ for which the density value (or likelihood) $p_1^n(x)$
is larger than $p_0^n(x)$.

The classical Bayesian error probability $\mathrm{Err}(\lambda)$ of a test function $\lambda$
($0 \le \lambda \le 1$) is defined by

\[ \mathrm{Err}(\lambda) := \tfrac12 \bigl(E_{P_0}\lambda + E_{P_1}(1 - \lambda)\bigr), \tag{2} \]

where $E_P$ stands for expectation under the law P. The quantity $\mathrm{Err}(\lambda)$
averages over both possible sources of error with equal weights 1/2. In the
more general situation, the weights are specified by the a priori probabilities
$(\pi_0, \pi_1)$ for $H_0$ or $H_1$ to occur, that is, $\mathrm{Err}(\lambda) := \pi_0 E_{P_0}\lambda + \pi_1 E_{P_1}(1 - \lambda)$.

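As already mentioned in the Introduction, the Bayesian error probability
$\mathrm{Err}(\lambda_n^*)$ vanishes, as $n \to \infty$, with a minimal asymptotic rate exponent equal
to the Chernoff bound $\delta(P_0, P_1)$:

\[ \lim_{n\to\infty} \frac1n \log \mathrm{Err}(\lambda_n^*) = \delta(P_0, P_1) := \inf_{0\le s\le 1} \log \sum_{x\in\Omega} p_0^{1-s}(x)\, p_1^{s}(x). \tag{3} \]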
We remark that

\[ \sum_{x\in\Omega} p_0^{1-s}(x)\, p_1^{s}(x) =: A(s), \qquad s \in [0,1], \tag{4} \]

represent the normalization factors of the parametric family of probability
measures

\[ p_s(x) := p_0^{1-s}(x)\, p_1^{s}(x)\, A^{-1}(s), \qquad s \in [0,1]. \]

The family is called a Hellinger arc in the literature. It interpolates between
$p_0$ and $p_1$ if their supports $D_0, D_1 \subset \Omega$ coincide. Otherwise, $p_s$, $s \in [0,1]$,
is discontinuous (in the Euclidean metric of $\mathbb{R}^{|\Omega|}$) at the endpoints s = 0, 1,
such that over the open parameter interval (0, 1) it represents an interpolation
between the densities of the conditional probabilities $Q_0 := P_0(\cdot\,|B)$ and
$Q_1 := P_1(\cdot\,|B)$, where $B := D_0 \cap D_1$.
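
The normalization factor A(s), the interpolating family p_s and the infimum of log A(s) are easy to evaluate on a finite sample space. The following Python sketch does this for two probability vectors; it is an illustration only. The function names are hypothetical, the golden-section search is one possible choice of minimizer (it relies on log A(s) being convex on the open interval), and sums are restricted to the common support so that differing supports are handled for 0 < s < 1.

```python
import math

def normalizer(p0, p1, s):
    """A(s) = sum_x p0(x)^(1-s) * p1(x)^s over the common support, for 0 < s < 1."""
    return sum(a ** (1.0 - s) * b ** s for a, b in zip(p0, p1) if a > 0 and b > 0)

def hellinger_arc(p0, p1, s):
    """Density p_s(x) = p0(x)^(1-s) * p1(x)^s / A(s), for 0 < s < 1."""
    a_s = normalizer(p0, p1, s)
    return [(a ** (1.0 - s) * b ** s) / a_s if a > 0 and b > 0 else 0.0
            for a, b in zip(p0, p1)]

def chernoff_bound(p0, p1, tol=1e-10):
    """inf over s in [0, 1] of log A(s), by golden-section search on (0, 1).

    At the endpoints A(0) = A(1) = 1, so the value 0 is always a candidate.
    """
    if normalizer(p0, p1, 0.5) == 0.0:
        return float("-inf")  # disjoint supports: A(s) = 0 on (0, 1)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if math.log(normalizer(p0, p1, m1)) < math.log(normalizer(p0, p1, m2)):
            hi = m2
        else:
            lo = m1
    s_mid = (lo + hi) / 2.0
    return min(0.0, math.log(normalizer(p0, p1, s_mid)))

# Hypothetical example on a two-point sample space.
p0 = [0.5, 0.5]
p1 = [0.9, 0.1]
val = chernoff_bound(p0, p1)
print(val, hellinger_arc(p0, p1, 0.5))
```

For identical distributions the returned value is 0 up to rounding, and swapping the two arguments leaves the value unchanged, as expected from the symmetry s to 1 - s.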

There is an equivalent expression for the Chernoff bound (3) in terms of
the KL-distance (relative entropy):

\[ \delta(P_0, P_1) = \inf_{s\in[0,1]} \bigl( -(1-s) K(Q_s \| Q_0) - s K(Q_s \| Q_1) + \log \pi_0^{1-s}\pi_1^{s} \bigr), \tag{5} \]

where $Q_s$ denotes the conditional probability $P_s(\cdot\,|B)$, for $s \in [0,1]$, and
$\pi_i := P_i(B)$, for i = 0, 1. Observe that if the supports $D_0$ and $D_1$ coincide,
that is, $B = \Omega$, then the target function in (5), which we will refer to as $H(s)$
in the sequel, becomes simply $-(1-s) K(P_s \| P_0) - s K(P_s \| P_1)$. What is
remarkable is that in this case we have

\[ \delta(P_0, P_1) = -K(P_\sigma \| P_0) = -K(P_\sigma \| P_1), \]

where the parameter $\sigma \in [0, 1]$ is uniquely defined by the second equality
above. In the generic case of possibly different supports, a modified version
of the above formula is valid. One distinguishes two cases: if there exists a
$\sigma \in (0, 1)$ such that $H'(\sigma) = 0$, which is equivalent to
$K(Q_\sigma \| Q_0) - K(Q_\sigma \| Q_1) = \log(\pi_0/\pi_1)$, then

\[ \delta(P_0, P_1) = -K(Q_\sigma \| Q_0) + \log \pi_0 = -K(Q_\sigma \| Q_1) + \log \pi_1 . \]

Otherwise, the infimum in (5) is attained either at s = 0 or at s = 1, and the
corresponding values of the Chernoff bound are $\log \pi_0$ and $\log \pi_1$.

The identity (5) and the other claims in the above paragraph follow from 
(23) in the Appendix and attendant reasoning. To our knowledge, no quan- 
tum generalization of (5) has yet been found. 
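
For a finite sample space, the relative entropy form of the Chernoff bound can be checked by a direct computation. The following is a sketch only, under the assumption that all densities are taken with respect to counting measure; here q_s denotes the density of Q_s on B, the conditional measures Q_i have densities p_i / pi_i on B, and E_s denotes expectation under q_s (notation introduced for this sketch).

```latex
% For 0 < s < 1, with q_s = p_0^{1-s} p_1^{s} / A(s) on B and \pi_i = P_i(B):
\begin{align*}
K(Q_s \| Q_0) &= \sum_{x \in B} q_s(x) \log \frac{q_s(x)\,\pi_0}{p_0(x)}
   = s\, E_s \log\frac{p_1}{p_0} - \log A(s) + \log \pi_0, \\
K(Q_s \| Q_1) &= \sum_{x \in B} q_s(x) \log \frac{q_s(x)\,\pi_1}{p_1(x)}
   = -(1-s)\, E_s \log\frac{p_1}{p_0} - \log A(s) + \log \pi_1 .
\end{align*}
% The terms containing E_s log(p_1/p_0) cancel in the weighted sum:
\[
-(1-s) K(Q_s \| Q_0) - s K(Q_s \| Q_1) + (1-s)\log\pi_0 + s \log\pi_1 = \log A(s).
\]
```

Taking the infimum over s then recovers the variational formula in terms of log A(s), with the endpoint values handled separately as described above.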

In the following theorem we formulate the classical result (3) for the
general case of probability measures $P_0, P_1$ on an arbitrary measurable space
$(\Omega, \Sigma)$, not necessarily finite. Consider the Bayesian error probability of
discrimination between $P_0, P_1$ by means of test functions $0 \le \lambda \le 1$:

\[ \Delta(P_0, P_1) := \inf_{\lambda \text{ test function}} \mathrm{Err}(\lambda), \tag{6} \]

where $\mathrm{Err}(\lambda)$ is given by (2). Let $\lambda^*$ be the maximum likelihood test function
$\lambda^* = \mathbf{1}\{p_1 - p_0 > 0\}$ on $\Omega$, in terms of densities $p_0$, $p_1$, for some dominating
measure $\mu$. It is well known that $\Delta(P_0, P_1)$ can be expressed as

\[ \Delta(P_0, P_1) = \mathrm{Err}(\lambda^*) = \tfrac12 \int \min(p_0, p_1)\, d\mu. \tag{7} \]

Theorem 2.1. Let $P_0, P_1$ be two probability measures on $(\Omega, \Sigma)$. For
product measures $P_0^n$, $P_1^n$ corresponding to n i.i.d. observations $\omega_1, \ldots, \omega_n$,
all having law $P_0$ or $P_1$, the Bayesian error probability satisfies

\[ \lim_{n\to\infty} n^{-1} \log \Delta(P_0^n, P_1^n) = \inf_{0\le s\le 1} \log \int p_1^{s}\, p_0^{1-s}\, d\mu, \tag{8} \]

where $p_i = dP_i/d\mu$, i = 0, 1, $\mu := P_0 + P_1$.

For strictly positive $p_0$ and $p_1$ with $p_0 \ne p_1$, the proof can be found in the
literature (cf., e.g., [5], page 164, or for finite sample space [7], page 312). For
completeness, we present a proof for the general case of possibly different
support of $P_0, P_1$ in the Appendix. Indeed, if $P_0, P_1$ have the same support,
then the function $A(s) = \int p_1^{s}\, p_0^{1-s}\, d\mu$ is analytic and strictly convex, hence a
minimizer $\sigma \in [0, 1]$ of A(s) exists, and the infimum is, in fact, a minimum.
However, if the supports are different, then A(s) may be discontinuous at the
endpoints of the interval [0, 1]. Hence, a minimizer need not exist, and the
r.h.s. in (8) is only an infimum. The proof of our main theorem, Theorem
2.2 below, uses the above classical result for the general case of possibly
different support.
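
The convergence stated in Theorem 2.1 can be observed numerically in a simple case. The following Python sketch treats two Bernoulli laws, computes the minimal Bayesian error of the n-fold products exactly as one half of the sum of the pointwise minimum of the two product densities (grouping sequences by their number of ones), and compares the normalized logarithm with the infimum on the right-hand side of (8). The function names, the grid search and the example parameters are assumptions made for illustration.

```python
import math

def log_min_error(n, a, b):
    """log of (1/2) * sum_x min(p0^n(x), p1^n(x)) for Bernoulli(a) vs Bernoulli(b),
    with 0 < a, b < 1, grouping sequences by their number of ones k."""
    terms = []
    for k in range(n + 1):
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_p0 = k * math.log(a) + (n - k) * math.log(1 - a)
        log_p1 = k * math.log(b) + (n - k) * math.log(1 - b)
        terms.append(log_binom + min(log_p0, log_p1))
    top = max(terms)
    # log-sum-exp keeps the computation stable for large n.
    return math.log(0.5) + top + math.log(sum(math.exp(t - top) for t in terms))

def chernoff_exponent(a, b, grid=10000):
    """Grid approximation of inf over s in [0, 1] of
    log(a^(1-s) b^s + (1-a)^(1-s) (1-b)^s)."""
    def log_a(s):
        return math.log(a ** (1 - s) * b ** s + (1 - a) ** (1 - s) * (1 - b) ** s)
    return min(log_a(i / grid) for i in range(grid + 1))

# Hypothetical example: a fair coin against a coin with bias 0.9.
delta = chernoff_exponent(0.5, 0.9)
rates = {n: log_min_error(n, 0.5, 0.9) / n for n in (10, 100, 1000)}
print(delta, rates)
```

In this example the normalized log-error stays below the limiting exponent for every n and approaches it slowly, with a gap of order (log n)/n.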

We intend to investigate the asymptotic behavior of the Bayesian error
probability in the case where the hypotheses are quantum states on $\mathcal{B}(\mathcal{H})$,
where $\dim \mathcal{H} = d < \infty$. In order to derive the optimal asymptotic rate
exponent, we replace the target function in the variational formula (3) or (8),
which defines the classical Chernoff bound, by

\[ A(s) := \operatorname{Tr}[\rho_0^{1-s}\rho_1^{s}], \qquad s \in [0, 1]. \]

Our main theorem, formulated below, confirms that the logarithm of the
infimum of A(s) over [0, 1] gives a lower bound on the optimal quantum
error exponent.

Theorem 2.2 (Quantum Chernoff lower bound). Let $\rho_0, \rho_1$ be two density
operators representing quantum states on a finite-dimensional complex
Hilbert space $\mathcal{H}$. Then, any sequence of test projections $\Pi_n \in \mathcal{B}(\mathcal{H}^{\otimes n})$, $n \in \mathbb{N}$,
satisfies

\[ \liminf_{n\to\infty} \frac1n \log \mathrm{Err}(\Pi_n) \ge \inf_{0\le s\le 1} \log \operatorname{Tr}[\rho_0^{1-s}\rho_1^{s}]. \tag{9} \]
We point out that, indeed, A(s) represents the proper generalization of
(4) in the context of symmetric hypothesis testing. As already noted in the
Introduction, and as conjectured in [17], it turns out to be achievable (see
[1]).
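
The right-hand side of the quantum bound can be evaluated numerically from the spectral decompositions of the two states, since for 0 < s < 1 the trace of rho0^(1-s) rho1^s equals the sum over i, j of lambda_i^(1-s) gamma_j^s |<x_i|y_j>|^2, with lambda_i, x_i and gamma_j, y_j the eigenvalues and eigenvectors of rho0 and rho1. The following Python sketch does this for a qubit. It is an illustration under stated assumptions: the eigenvectors are real, the eigenbasis of rho1 is the eigenbasis of rho0 rotated by a given angle, the infimum is approximated on a grid, and the endpoint value 1 at s = 0, 1 is included as a candidate. All names are hypothetical.

```python
import math

def qubit_overlaps(angle):
    """|<x_i|y_j>|^2 when the eigenbasis y of rho1 is the eigenbasis x of rho0
    rotated by `angle` (real 2x2 case)."""
    c2, s2 = math.cos(angle) ** 2, math.sin(angle) ** 2
    return [[c2, s2], [s2, c2]]

def trace_power_product(lam, gam, overlaps, s):
    """Tr[rho0^(1-s) rho1^s] = sum_ij lam_i^(1-s) gam_j^s |<x_i|y_j>|^2, 0 < s < 1."""
    total = 0.0
    for i, l in enumerate(lam):
        for j, g in enumerate(gam):
            if l > 0 and g > 0:  # zero eigenvalues do not contribute for 0 < s < 1
                total += l ** (1.0 - s) * g ** s * overlaps[i][j]
    return total

def quantum_chernoff_bound(lam, gam, angle, grid=2000):
    """Grid approximation of inf over s in [0, 1] of log Tr[rho0^(1-s) rho1^s]."""
    ov = qubit_overlaps(angle)
    values = [trace_power_product(lam, gam, ov, k / grid) for k in range(1, grid)]
    smallest = min(min(values), 1.0)  # the value at s = 0 and s = 1 is 1
    return math.log(smallest) if smallest > 0 else float("-inf")

# Hypothetical examples: a commuting pair (angle 0) and a pair of pure states.
commuting = quantum_chernoff_bound((0.5, 0.5), (0.9, 0.1), 0.0)
pure = quantum_chernoff_bound((1.0, 0.0), (1.0, 0.0), math.pi / 4)
print(commuting, pure)
```

With angle 0 the two states commute and the value reduces to the classical Chernoff bound of the two eigenvalue distributions. For two pure states the function of s is constant on (0, 1) and the value is the logarithm of the squared overlap of the two state vectors.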

It is of interest to evaluate the quantum Chernoff bound for special cases
and to investigate its properties as a distinguishability measure for quantum
states. In the classical case, it is well known that if $P_i$ are normal
laws $N(\mu_i, \sigma^2)$, then the r.h.s. of (8) is $-(\mu_1 - \mu_0)^2/8\sigma^2$. The discussion of
quantum Gaussian states would require admitting an infinite-dimensional
complex Hilbert space $\mathcal{H}$, and thus it is outside the scope of our paper. It
can be conjectured, however, that our method of proof readily generalizes
to an infinite-dimensional setting. Anticipating such a generalized bound,
Calsamiglia et al. [4] have recently written down the appropriate analog of
the r.h.s. of (9) for Gaussian states and evaluated it for various examples of
Gaussian states of light (cf. also [15]). Another discussion of the geometric
properties of the quantum Chernoff bound and a derivation of the related
quantum Hoeffding bound can be found in [2].
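
For the classical normal example, the value of the Chernoff bound follows from completing the square. The following short derivation is a sketch in notation introduced here: p_0 and p_1 are the Lebesgue densities of the two normal laws with means mu_0, mu_1 and common variance sigma^2, and m_s denotes the interpolated mean.

```latex
% With m_s = (1-s)\mu_0 + s\mu_1, the exponents combine as
\[
(1-s)(x-\mu_0)^2 + s(x-\mu_1)^2 = (x - m_s)^2 + s(1-s)(\mu_1-\mu_0)^2 ,
\]
% so that, for every s in [0, 1],
\[
\int p_0^{1-s}(x)\, p_1^{s}(x)\, dx = \exp\Bigl(-\frac{s(1-s)(\mu_1-\mu_0)^2}{2\sigma^2}\Bigr).
\]
% The exponent is minimal at s = 1/2, which gives
\[
\inf_{0\le s\le 1} \log \int p_0^{1-s} p_1^{s}\, dx = -\frac{(\mu_1-\mu_0)^2}{8\sigma^2}.
\]
```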

3. Proof of the main theorem. We will prove Theorem 2.2, applying 
the corresponding classical result, Theorem 2.1, to appropriate probability 
distributions appearing in the general noncommutative setting. 

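Proof of Theorem 2.2. We will establish

\[ \liminf_{n\to\infty} \frac1n \log(\mathrm{Err}(\Pi_n)) \ge \inf_{0\le s\le 1} \log \operatorname{Tr}[\rho_0^{1-s}\rho_1^{s}], \]

for any sequence of projections $\Pi_n \in \mathcal{B}(\mathcal{H}^{\otimes n})$, $n \in \mathbb{N}$.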

We consider two arbitrary density operators $\rho_0$, $\rho_1$ on a finite-dimensional
Hilbert space $\mathcal{H} = \mathbb{C}^d$ with spectral representations

\[ \rho_0 = \sum_{i=1}^d \lambda_i |x_i\rangle\langle x_i|, \qquad \rho_1 = \sum_{i=1}^d \gamma_i |y_i\rangle\langle y_i|, \]

that is, $|x_i\rangle$, i = 1, ..., d, and $|y_i\rangle$, i = 1, ..., d, are two orthonormal bases
(ONB) of eigenvectors in $\mathbb{C}^d$, and $\lambda_i, \gamma_i \in [0, 1]$ are the respective eigenvalues
of $\rho_0$ and $\rho_1$.

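Let $\Pi$ be a projection onto a subspace of $\mathbb{C}^d$, then

\begin{align*}
\operatorname{Tr}[\Pi\rho_0] &= \operatorname{Tr}\Bigl[\Pi \sum_{i=1}^d \lambda_i |x_i\rangle\langle x_i|\Bigr]
  = \sum_{i=1}^d \lambda_i \langle x_i|\Pi x_i\rangle \\
 &= \sum_{i=1}^d \lambda_i \|\Pi x_i\|^2 = \sum_{i=1}^d \lambda_i \sum_{j=1}^d |\langle \Pi x_i|y_j\rangle|^2,
\end{align*}

where the third identity is true since $\Pi$ is a projection, and the last one is
by Parseval's identity for the ONB $|y_j\rangle$, j = 1, ..., d. In the same way, we
obtain

\[ \operatorname{Tr}[(1 - \Pi)\rho_1] = \sum_{j=1}^d \gamma_j \sum_{i=1}^d |\langle (1 - \Pi) y_j|x_i\rangle|^2 . \]

Now, in view of the identity $|\langle (1 - \Pi) y_j|x_i\rangle|^2 = |\langle (1 - \Pi) x_i|y_j\rangle|^2$, we have

\begin{align*}
\mathrm{Err}(\Pi) &= \tfrac12 \bigl(\operatorname{Tr}[\rho_0 \Pi] + \operatorname{Tr}[\rho_1(1 - \Pi)]\bigr) \\
 &= \tfrac12 \sum_{i,j=1}^d \bigl(\lambda_i |\langle \Pi x_i|y_j\rangle|^2 + \gamma_j |\langle (1 - \Pi) x_i|y_j\rangle|^2\bigr).
\end{align*}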

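Denote $a = \langle \Pi x_i|y_j\rangle$ and $b = \langle (1 - \Pi) x_i|y_j\rangle$, so that $a + b = \langle x_i|y_j\rangle$. Since
$\lambda_i |a|^2 + \gamma_j |b|^2 \ge \min\{\lambda_i, \gamma_j\}(|a|^2 + |b|^2)$, and since for any complex a, b the
inequality $|a|^2 + |b|^2 \ge |a + b|^2/2$ holds, we obtain from the last display

\[ \mathrm{Err}(\Pi) \ge \sum_{i,j=1}^d \tfrac14 \min\{\lambda_i, \gamma_j\}\, |\langle x_i|y_j\rangle|^2 . \tag{10} \]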

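Note that

\[ p_{i,j} := \lambda_i |\langle x_i|y_j\rangle|^2, \qquad q_{i,j} := \gamma_j |\langle x_i|y_j\rangle|^2, \qquad i, j = 1, \ldots, d, \tag{11} \]

define probability measures P and Q on $d^2$ elements, respectively. Indeed,

\[ \sum_{i,j=1}^d p_{i,j} = \sum_{i,j=1}^d \lambda_i |\langle x_i|y_j\rangle|^2 = \sum_{i=1}^d \lambda_i \langle x_i|x_i\rangle = \sum_{i=1}^d \lambda_i = 1, \]

and similarly for $(q_{i,j})$. Now, inequality (10) may be written

\[ \mathrm{Err}(\Pi) \ge \tfrac14 \sum_{i,j=1}^d \min\{p_{i,j}, q_{i,j}\}. \tag{12} \]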

Observe that, according to (6) and (7), the r.h.s. above is, up to the factor 1/2,
equal to the classical minimal Bayesian error probability $\Delta(P, Q)$ of
discrimination between probability measures P and Q:

\[ \tfrac12 \sum_{i,j=1}^d \min\{p_{i,j}, q_{i,j}\} = \Delta(P, Q). \tag{13} \]
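
The reduction to a classical pair of distributions can be checked numerically in small dimension. The following Python sketch builds p_ij = lambda_i |<x_i|y_j>|^2 and q_ij = gamma_j |<x_i|y_j>|^2 for a qubit and verifies, on a grid of rank-one projections, that the quantum Bayesian error is never below one quarter of the sum of min(p_ij, q_ij). It is only an illustration: real eigenvectors are assumed, with x_i the standard basis and y_j the standard basis rotated by a given angle, and the names and parameter values are hypothetical.

```python
import math

def classical_pair(lam, gam, angle):
    """p_ij = lam_i |<x_i|y_j>|^2 and q_ij = gam_j |<x_i|y_j>|^2 for a real qubit,
    with x_i the standard basis and y_j the standard basis rotated by `angle`."""
    c2, s2 = math.cos(angle) ** 2, math.sin(angle) ** 2
    ov = [[c2, s2], [s2, c2]]
    p = [[lam[i] * ov[i][j] for j in range(2)] for i in range(2)]
    q = [[gam[j] * ov[i][j] for j in range(2)] for i in range(2)]
    return p, q

def lower_bound(p, q):
    """One quarter of the sum of the entrywise minimum of p and q."""
    return 0.25 * sum(min(p[i][j], q[i][j]) for i in range(2) for j in range(2))

def bayes_error(lam, gam, angle, t):
    """Err(Pi) = (1/2)(Tr[rho0 Pi] + Tr[rho1 (1 - Pi)]) for the rank-one
    projection Pi onto the unit vector v = (cos t, sin t)."""
    # <v|rho0|v> with rho0 = diag(lam).
    tr0 = lam[0] * math.cos(t) ** 2 + lam[1] * math.sin(t) ** 2
    # <v|rho1|v> with rho1 diagonal in the basis rotated by `angle`.
    tr1 = gam[0] * math.cos(t - angle) ** 2 + gam[1] * math.sin(t - angle) ** 2
    return 0.5 * (tr0 + 1.0 - tr1)

# Hypothetical example parameters.
lam, gam, angle = (0.8, 0.2), (0.6, 0.4), 0.7
p, q = classical_pair(lam, gam, angle)
bound = lower_bound(p, q)
worst = min(bayes_error(lam, gam, angle, 0.01 * k) for k in range(315))
print(bound, worst)
```

The two trivial projections (zero and identity) both give the error 1/2, which also lies above the bound, since the bound never exceeds 1/4.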

Next, we consider the case where the quantum hypotheses are $\rho_0^{\otimes n}$ and
$\rho_1^{\otimes n}$. Then, the corresponding classical probability measures according to
(11) are product measures $P^n$ and $Q^n$, for P, Q corresponding to $\rho_0, \rho_1$,
respectively. Applying inequality (12), (13) and subsequently combining it
with the classical result on the Chernoff bound for $\Delta(P^n, Q^n)$, Theorem 2.1,
we obtain for any sequence of projections $\Pi_n \in \mathcal{B}(\mathcal{H}^{\otimes n})$, $n \in \mathbb{N}$,

\[ \liminf_{n\to\infty} \frac1n \log \mathrm{Err}(\Pi_n) \ge \lim_{n\to\infty} \frac1n \log\Bigl(\frac12 \Delta(P^n, Q^n)\Bigr)
   = \inf_{0\le s\le 1} \log\Bigl(\sum_{i,j=1}^d p_{i,j}^{1-s}\, q_{i,j}^{s}\Bigr). \]

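We finish the proof by verifying

\begin{align*}
\sum_{i,j=1}^d p_{i,j}^{1-s}\, q_{i,j}^{s} &= \sum_{i,j=1}^d \lambda_i^{1-s} \gamma_j^{s}\, |\langle x_i|y_j\rangle|^2
  = \sum_{i,j=1}^d \lambda_i^{1-s} \langle x_i|y_j\rangle\, \gamma_j^{s} \langle y_j|x_i\rangle \\
 &= \operatorname{Tr}\Bigl[\sum_{i,j=1}^d \lambda_i^{1-s} |x_i\rangle\langle x_i|\, \gamma_j^{s} |y_j\rangle\langle y_j|\Bigr]
  = \operatorname{Tr}[\rho_0^{1-s}\rho_1^{s}]. \qquad \Box
\end{align*}

APPENDIX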



As announced in Section 2, we give a proof for Theorem 2.1 for the gen- 
eral case where the two probability measures involved are allowed to have 
different supports. As far as possible, we follow the proof in the case of same 
support by Chentsov and Morozova [5]. 

Proof of Theorem 2.1. 1. Preliminary observations. Assume that
two probability measures $P_0, P_1$ on a measurable space $(\Omega, \Sigma)$ have support
$D_i = \operatorname{supp}(P_i)$, i = 0, 1. Denote $B = D_0 \cap D_1$, and for i = 0, 1,

\[ S_i = D_i \setminus B. \tag{14} \]

We introduce the measure $\mu = P_0 + P_1$ and define the densities $p_i = dP_i/d\mu$,
i = 0, 1. Then, clearly $p_0 + p_1 = 1$. We assume the densities and the sets $D_i$
are chosen such that

\[ D_i = \{\omega : p_i(\omega) > 0\}, \qquad i = 0, 1, \]

hence

\[ B = \{\omega : p_0(\omega) > 0,\ p_1(\omega) > 0\}. \]
Recall the definition of the Hellinger arc of densities for parameter $s \in [0, 1]$:

\[ p_s(\omega) = p_1^{s}(\omega)\, p_0^{1-s}(\omega)\, A^{-1}(s), \]

where

\[ A(s) = \int p_1^{s}(\omega)\, p_0^{1-s}(\omega)\, \mu(d\omega) \]

is a normalizing factor. Note that for s = 0 and s = 1, we obtain the initial
densities $p_0$, $p_1$ respectively, so that A(0) = A(1) = 1. However, the function
A(s) is not continuous in general at the endpoints 0, 1. Indeed, for $s \in (0, 1)$ the
integral is over the set B,

\[ A(s) = \int_B p_1^{s}(\omega)\, p_0^{1-s}(\omega)\, \mu(d\omega), \]

and by dominated convergence it follows that

\begin{align*}
A_+(0) &:= \lim_{s \searrow 0} A(s) = \int_B p_0(\omega)\, \mu(d\omega) = P_0(B), \\
A_-(1) &:= \lim_{s \nearrow 1} A(s) = \int_B p_1(\omega)\, \mu(d\omega) = P_1(B).
\end{align*}

Furthermore, observe that for $s \in (0, 1)$ the densities $p_s$ have support B,
with limits at the endpoints

\[ p_{0+}(\omega) = p_0(\omega)/P_0(B), \qquad p_{1-}(\omega) = p_1(\omega)/P_1(B). \]

Hence, the corresponding limiting measures are the conditional probability
measures

\[ P_{0+}(\cdot) = P_0(\cdot\,|B), \qquad P_{1-}(\cdot) = P_1(\cdot\,|B). \]

If the sample space is restricted to B, the densities $p_s$, $s \in (0, 1)$, can be
written in exponential family form:

\[ p_s(\omega) = \exp\Bigl(s \log \frac{p_1(\omega)}{p_0(\omega)}\Bigr)\, p_0(\omega)\, A^{-1}(s), \qquad \omega \in B. \tag{15} \]

For s = 0, 1, the above holds if $B = D_s$. Also, for s = 0, 1, if $B \ne D_s$, then
the restriction $p_s|B$ is not a probability density. We denote

\[ H(s) = \log A(s), \qquad H_+(0) = \log P_0(B), \qquad H_-(1) = \log P_1(B). \]

2. Bayesian error probabilities $\mathrm{Err}(\lambda_n^*)$ by change of measure to $P_s$. Recall
the form of the optimal test $\lambda_n^*$ on $\Omega^n$ for equiprobable hypothetic densities
$p_0$ and $p_1$ on $\Omega$:

\[ \lambda_n^*(\omega^n) = \mathbf{1}\Bigl\{\prod_{j=1}^n p_1(\omega_j) > \prod_{j=1}^n p_0(\omega_j)\Bigr\}, \]

where $\omega_1, \ldots, \omega_n$ are n i.i.d. observations. (One may also take "$\ge$" or decide
arbitrarily on the "=" set.) We partition the set $\Omega^n$ into disjoint subsets
$S_{0,n}$, $S_{1,n}$ and $B_n$:

\begin{align*}
S_{0,n} &:= \{\text{there is } j \in \{1, \ldots, n\} \text{ such that } \omega_j \in S_0\}, \\
S_{1,n} &:= \{\text{there is } j \in \{1, \ldots, n\} \text{ such that } \omega_j \in S_1\},
\end{align*}

where $S_i$, i = 0, 1, were defined in (14). The remaining case is the event

\[ B_n := \{\omega^n \in \Omega^n : \omega_j \in B \text{ for } j = 1, \ldots, n\}. \]

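Denote $\omega^n = (\omega_1, \ldots, \omega_n) \in \Omega^n$. We have $\lambda_n^*(\omega^n) = 1$ (decision in favor of $P_1$)
if $\omega^n \in S_{1,n}$, that is, an event happens which excludes $P_0$. Similarly, we have
$\lambda_n^*(\omega^n) = 0$ for $\omega^n \in S_{0,n}$. For $\omega^n \in B_n$, define the (normed) log-likelihood
ratio by

\[ L_n(\omega^n) = n^{-1} \sum_{j=1}^n \log \frac{p_1}{p_0}(\omega_j). \]

Then, we can describe the test $\lambda_n^*$ as

\[ \lambda_n^*(\omega^n) = \mathbf{1}\{L_n(\omega^n) > 0,\ \omega^n \in B_n\} + \mathbf{1}\{\omega^n \in S_{1,n}\}. \tag{16} \]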

Further, we define, for i = 0, 1, functions

\[ G_{s,n}^{(i)}(\omega^n) = \mathbf{1}\{\omega^n \in B_n\}\; n^{-1} \sum_{j=1}^n \log \frac{p_i}{p_s}(\omega_j). \]

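We note the following relations, for $\omega \in B$:

\begin{align}
\log \frac{p_0}{p_s}(\omega) &= -s \log \frac{p_1}{p_0}(\omega) + H(s), \tag{17} \\
\log \frac{p_1}{p_s}(\omega) &= (1 - s) \log \frac{p_1}{p_0}(\omega) + H(s). \tag{18}
\end{align}

To prove (18), observe that

\begin{align*}
\log \frac{p_1}{p_s} &= \log \frac{p_1 A(s)}{\exp(s \log p_1/p_0)\, p_0}
  = \log \frac{p_1}{p_0} - s \log \frac{p_1}{p_0} + H(s) \\
 &= (1 - s) \log \frac{p_1}{p_0} + H(s).
\end{align*}

Furthermore, it holds

\[ \log \frac{p_0}{p_s} = \log \frac{p_0 A(s)}{\exp(s \log p_1/p_0)\, p_0} = -s \log \frac{p_1}{p_0} + H(s), \]

which implies (17). As a consequence of (17) and (18), we have, for $\omega^n \in B_n$,

\begin{align}
G_{s,n}^{(0)}(\omega^n) &= -s L_n(\omega^n) + H(s), \tag{19} \\
G_{s,n}^{(1)}(\omega^n) &= (1 - s) L_n(\omega^n) + H(s). \tag{20}
\end{align}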
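In the sequel, we write $E_s$ for expectation under the density $p_s$ and denote
by $E_s^n$ the expectation under the product density for the respective basic
density $p_s$. Notice that the test $\lambda_n^*$ necessarily decides correctly if
$\omega^n \in B_n^c = S_{0,n} \cup S_{1,n}$. Thus, the minimal Bayesian error probabilities can be
expressed, for any $s \in (0, 1)$, as

\begin{align}
\mathrm{Err}(\lambda_n^*) &= \tfrac12 \bigl(E_0^n \lambda_n^* + E_1^n (1 - \lambda_n^*)\bigr)
  = \tfrac12 \bigl(E_0^n \mathbf{1}_{B_n} \lambda_n^* + E_1^n \mathbf{1}_{B_n} (1 - \lambda_n^*)\bigr) \notag \\
 &= \tfrac12 \bigl(E_s^n \lambda_n^* \exp(n G_{s,n}^{(0)}) + E_s^n (1 - \lambda_n^*) \exp(n G_{s,n}^{(1)})\bigr) \tag{21} \\
 &= \tfrac12 \bigl(E_s^n \lambda_n^* \exp(-ns L_n + n H(s)) + E_s^n (1 - \lambda_n^*) \exp(n(1 - s) L_n + n H(s))\bigr) \notag \\
 &= \tfrac12 \exp(n H(s)) \bigl\{E_s^n \bigl(\lambda_n^* \exp(-ns L_n) + (1 - \lambda_n^*) \exp(n(1 - s) L_n)\bigr)\bigr\}. \tag{22}
\end{align}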

3. Upper risk bound. From the expression (16) for $\lambda_n^*$, we see that, for all
$\omega^n \in B_n$,

\[ \lambda_n^* \exp(-ns L_n) + (1 - \lambda_n^*) \exp(n(1 - s) L_n) \le 1, \]

so that (22) implies, for all $n \in \mathbb{N}$,

\[ \mathrm{Err}(\lambda_n^*) \le \exp(n H(s)) \]

and, hence,

\[ \frac1n \log \mathrm{Err}(\lambda_n^*) \le H(s). \]

Since $s \in (0, 1)$ was arbitrary, and since the bounds H(0) = H(1) = 0 are
trivial, we obtain

\[ \frac1n \log \mathrm{Err}(\lambda_n^*) \le \inf_{0\le s\le 1} H(s). \]
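
The same upper bound can also be read off from an elementary pointwise inequality, without the change of measure. The following is a brief sketch, in which p_0^n and p_1^n denote the product densities with respect to the product measure mu^n (notation introduced here) and the expression of the minimal error as one half of the integral of the pointwise minimum is taken from (7).

```latex
% For real numbers a, b >= 0 and s in [0, 1]:
\[
\min(a, b) \le a^{1-s}\, b^{s}.
\]
% Applied to the product densities and integrated over the n-fold sample space:
\[
\mathrm{Err}(\lambda_n^*) = \frac12 \int \min(p_0^n, p_1^n)\, d\mu^n
  \le \frac12 \int (p_0^n)^{1-s} (p_1^n)^{s}\, d\mu^n = \frac12 A(s)^n ,
\qquad\text{hence}\qquad
\frac1n \log \mathrm{Err}(\lambda_n^*) \le \log A(s) = H(s).
\]
```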

4. Convexity of H(s) on (0, 1). Using the exponential family expression
(15) for densities $p_s$, the function H(s) may be written for $s \in (0, 1)$,

\[ H(s) = \log \int_B \exp\Bigl(s \log \frac{p_1(\omega)}{p_0(\omega)}\Bigr)\, p_0(\omega)\, \mu(d\omega). \tag{23} \]

It follows

\[ H'(s) = \frac{A'(s)}{A(s)} = \frac{\int_B \log(p_1(\omega)/p_0(\omega)) \exp(s \log p_1(\omega)/p_0(\omega))\, p_0(\omega)\, d\mu(\omega)}{A(s)}, \]

where the fact that A(s) can be differentiated under the integral sign, and
the integral is finite for all $s \in (0, 1)$, is from the basic theory of exponential
families. In the sequel, we identify expectation under $p_s$ and its restriction
$p_s|B$ for $s \in (0, 1)$. We can thus write (for a random variable $\omega$ taking values
in B)

\[ H'(s) = E_s \log \frac{p_1(\omega)}{p_0(\omega)} = E_s \log \frac{p_s(\omega)}{p_0(\omega)} - E_s \log \frac{p_s(\omega)}{p_1(\omega)}. \tag{24} \]
For the second derivative, we obtain

\begin{align*}
H''(s) &= \frac{A''(s) A(s) - (A'(s))^2}{A^2(s)} \\
 &= \frac{\int_B (\log p_1(\omega)/p_0(\omega))^2 \exp(s \log p_1(\omega)/p_0(\omega))\, p_0(\omega)\, d\mu(\omega)}{A(s)} - (H'(s))^2 \ge 0,
\end{align*}

since the last expression is the variance of the random variable $\log(p_1/p_0)(\omega)$
under $p_s$. Thus, H(s) is convex on (0, 1). There are two cases.

Case 1. There is some $s \in (0, 1)$ such that $H''(s) = 0$. Then, $\log(p_1/p_0)(\omega)$
is constant $P_s$-almost surely. Since all $P_s$, $s \in (0, 1)$, dominate each
other, $(p_1/p_0)(\omega)$ is also constant $P_s$-almost surely, for all $s \in (0, 1)$,
and $H''(s) = 0$ for all these s. Hence, H(s) is linear on (0, 1). Furthermore,
each $P_s$, $s \in (0, 1)$, dominates $\mu$ on B (i.e., dominates $\mu|B$). It follows

\[ \frac{p_1}{p_0}(\omega) = c, \qquad \mu\text{-a.s. on } B, \]

for some constant c > 0. In that case,

\[ P_1(B) = \int_B c\, dP_0 = c P_0(B) \]

and

\[ c = \frac{P_1(B)}{P_0(B)}. \]

This implies

\begin{align}
P_0(\cdot\,|B) &= P_1(\cdot\,|B) = P_s, \qquad s \in (0, 1), \notag \\
A(s) &= (P_0(B))^{1-s} (P_1(B))^{s}, \qquad s \in (0, 1). \tag{25}
\end{align}

Case 2. For all $s \in (0, 1)$, we have $H''(s) > 0$. Then, H(s) is strictly convex
on (0, 1).

5. Lower risk bound. Since, according to (24), for arbitrary $s \in (0, 1)$,

\[ H'(s) = E_s \log \frac{p_1}{p_0}(\omega), \]

we have in view of (19) and (20), for each $n \in \mathbb{N}$,

\begin{align*}
E_s^n G_{s,n}^{(0)} &= -s H'(s) + H(s) =: \gamma_0(s), \\
E_s^n G_{s,n}^{(1)} &= (1 - s) H'(s) + H(s) =: \gamma_1(s).
\end{align*}

Since $G_{s,n}^{(i)}$ is an i.i.d. average, we have by the Law of Large Numbers, as n
tends to infinity,

\[ G_{s,n}^{(0)}(\omega^n) \to \gamma_0(s), \qquad G_{s,n}^{(1)}(\omega^n) \to \gamma_1(s), \]

almost surely under $P_s$. Let $\delta, \eta > 0$ be arbitrary and consider the subsets

\[ U_n := \{\omega^n : G_{s,n}^{(i)} - \gamma_i(s) \ge -\eta,\ i = 0, 1\}, \qquad n \in \mathbb{N}. \]

Then, again by the Law of Large Numbers, there is an $n_\delta \in \mathbb{N}$ such that
$P_s^n(U_n) \ge 1 - \delta$ for all $n \ge n_\delta$.

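Starting with identity (21), we estimate the minimal error probability for
$n \ge n_\delta$:

\begin{align*}
\mathrm{Err}(\lambda_n^*) &= \tfrac12 \bigl(E_s^n \lambda_n^* \exp(n G_{s,n}^{(0)}) + E_s^n (1 - \lambda_n^*) \exp(n G_{s,n}^{(1)})\bigr) \\
 &\ge \tfrac12 E_s^n \mathbf{1}\{U_n\} \bigl(\lambda_n^* \exp(n \gamma_0(s) - n\eta) + (1 - \lambda_n^*) \exp(n \gamma_1(s) - n\eta)\bigr) \\
 &\ge \tfrac12 E_s^n \mathbf{1}\{U_n\} \exp(n \min(\gamma_0(s), \gamma_1(s)) - n\eta) \\
 &\ge \tfrac12 (1 - \delta) \exp(n \min(\gamma_0(s), \gamma_1(s)) - n\eta).
\end{align*}

Consequently, since $\lambda_n^*$ minimizes the error probability, we have, for any
sequence of test functions $\lambda_n$, $n \in \mathbb{N}$,

\[ \liminf_{n\to\infty} n^{-1} \log \mathrm{Err}(\lambda_n) \ge \min(\gamma_0(s), \gamma_1(s)) - \eta. \]

Since $\eta$ was arbitrary, we obtain for any $s \in (0, 1)$

\[ \liminf_{n\to\infty} n^{-1} \log \mathrm{Err}(\lambda_n) \ge \min(\gamma_0(s), \gamma_1(s)), \]

and, hence,

\[ \liminf_{n\to\infty} n^{-1} \log \mathrm{Err}(\lambda_n) \ge \sup_{0<s<1} \min(\gamma_0(s), \gamma_1(s)). \]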

It remains to show that

\[ \sup_{0<s<1} \min(\gamma_0(s), \gamma_1(s)) \ge \inf_{0\le s\le 1} H(s). \tag{26} \]

Recall that the values $H'(s)$ are well defined for $s \in (0, 1)$ and that H(s) is
convex in that domain. Hence, there exist limits

\[ H'_+(0) = \lim_{s \searrow 0} H'(s), \qquad H'_-(1) = \lim_{s \nearrow 1} H'(s). \]

Observe that the limits are possibly infinite. However, due to convexity, only
$H'_+(0) = -\infty$ or $H'_-(1) = \infty$ may occur.

Again, in view of the convexity of H(s) on (0, 1), the following cases may
occur:

(a) $H'_+(0) < 0$, $H'_-(1) > 0$,
(b) $H'_+(0) < 0$, $H'_-(1) \le 0$,
(c) $H'_+(0) \ge 0$, $H'_-(1) > 0$,
(d) $H'_+(0) \ge 0$, $H'_-(1) \le 0$.

Case (a). In this case, H cannot be linear, so that due to the above
discussion in 4 (involving Cases 1 and 2) it is strictly convex in (0, 1). Hence,
there is a unique minimum of H on [0, 1] at some $\sigma \in (0, 1)$ with $H'(\sigma) = 0$.
We have

\[ \gamma_0(\sigma) = \gamma_1(\sigma) = H(\sigma), \]

hence,

\[ \sup_{0<s<1} \min(\gamma_0(s), \gamma_1(s)) \ge H(\sigma) = \inf_{0\le s\le 1} H(s). \]

Case (b). Again, due to convexity, the infimum of H on [0, 1] is attained
(uniquely) as $s \nearrow 1$:

\[ \inf_{0\le s\le 1} H(s) = \lim_{s \nearrow 1} H(s) = H_-(1). \]

Now, for $s \in (0, 1)$ we have $H'(s) < 0$, and, hence,

\[ \gamma_0(s) = -s H'(s) + H(s) \ge H(s) \ge (1 - s) H'(s) + H(s) = \gamma_1(s), \]

which implies

\[ \sup_{0<s<1} \min(\gamma_0(s), \gamma_1(s)) \ge \sup_{0<s<1} \gamma_1(s) \ge \limsup_{s \nearrow 1} \gamma_1(s)
   \ge H_-(1) = \inf_{0\le s\le 1} H(s). \]

Case (c). This is symmetric to case (b). We obtain

\[ \inf_{0\le s\le 1} H(s) = H_+(0) \]

and

\[ \sup_{0<s<1} \min(\gamma_0(s), \gamma_1(s)) \ge H_+(0) = \inf_{0\le s\le 1} H(s). \]

Indeed, for $s \in (0, 1)$, we have $H'(s) > 0$, and, hence,

\[ \gamma_1(s) = (1 - s) H'(s) + H(s) \ge H(s) \ge -s H'(s) + H(s) = \gamma_0(s), \]

which implies

\[ \sup_{0<s<1} \min(\gamma_0(s), \gamma_1(s)) \ge \sup_{0<s<1} \gamma_0(s) \ge \limsup_{s \searrow 0} \gamma_0(s)
   \ge H_+(0) = \inf_{0\le s\le 1} H(s). \]
0<s<l 
Case (d). Due to convexity, we must have $H'_+(0) = H'_-(1) = 0$; then, H(s)
is constant on (0, 1). By (25), we then have $P_0(B) = P_1(B)$ and

\[ H(s) = \log P_0(B) = \log P_1(B), \qquad s \in (0, 1). \]

Consequently,

\[ \gamma_0(s) = \gamma_1(s) = H(s) = \inf_{0\le s\le 1} H(s), \]

and we obtain trivially

\[ \sup_{0<s<1} \min(\gamma_0(s), \gamma_1(s)) \ge \inf_{0\le s\le 1} H(s). \]

We have verified inequality (26) in all cases (a)-(d). Hence, for any sequence
of test functions $\lambda_n$ on $\Omega^n$, $n \in \mathbb{N}$, we have

\[ \liminf_{n\to\infty} n^{-1} \log \mathrm{Err}(\lambda_n) \ge \liminf_{n\to\infty} n^{-1} \log \mathrm{Err}(\lambda_n^*) \ge \inf_{0\le s\le 1} H(s). \]

The upper and lower bounds together complete the proof. □